Pitch synchronized speech processing (PSSP) for speaker recognition
Abstract
A speech signal enhancement method is developed for automatic speaker recognition in cases where the signals are recorded under different channel conditions. The technique is built on a robust pitch detection algorithm that accurately estimates the instantaneous pitch rate and extracts speech segments one pitch period long. This pitch synchronized speech processing (PSSP) provides the highest time-frequency resolution for short-time Fourier analysis of speech signals. It also effectively eliminates all non-voiced signal regions and minimizes the spectral harmonics that arise when an analysis window spans multiple pitch periods. One significant benefit of PSSP is that feature warping can be applied to the pitch-synchronized spectra of two cross-channel signals. Feature warping in the spectral domain provides linear channel normalization and enhancement for spectrographic analysis. A cross-channel transfer function can then be derived from the feature warping process and applied to audio channel normalization and enhancement. Applying the PSSP feature warping transfer function improved speaker recognition performance on cross-channel speech signals from the CAVIS voice corpus [1]. However, PSSP alone did not improve recognition performance compared to Mel filterbank cepstral coefficients.
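The pitch-synchronous analysis described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify its pitch detector, so a plain autocorrelation peak picker is swapped in, and segments are cut at fixed period-length intervals rather than aligned to glottal events. The function names, the 60-400 Hz search range, and the FFT size are all assumptions made for illustration, and voiced/unvoiced detection is omitted.

```python
import numpy as np

def estimate_pitch_period(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate the pitch period in samples from the autocorrelation peak.

    Assumption: a simple stand-in for the robust detector the abstract mentions.
    """
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the full autocorrelation.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(ac) - 1)
    return lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))

def single_period_spectra(signal, period, n_fft=256):
    """Cut the signal into consecutive one-period segments and return
    the magnitude spectrum of each (zero-padded to n_fft samples)."""
    n_segments = len(signal) // period
    segments = signal[:n_segments * period].reshape(n_segments, period)
    return np.abs(np.fft.rfft(segments, n=n_fft, axis=1))

# Synthetic voiced signal: 100 Hz fundamental plus two harmonics, 1 s at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 100 * t)
     + 0.5 * np.sin(2 * np.pi * 200 * t)
     + 0.25 * np.sin(2 * np.pi * 300 * t))

period = estimate_pitch_period(x[:320], fs)   # estimate from a 40 ms frame
spectra = single_period_spectra(x, period)    # one spectrum per pitch period
```

Because each analysis window holds exactly one pitch period, the resulting spectra trace the spectral envelope without the harmonic ripple that appears when a window spans several periods. A real system would re-estimate the period as the pitch changes rather than reuse one value for the whole signal.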
Related articles
A Pitch Detection Algorithm Based on Special Points and Area
Pitch detection and estimation is a very important problem in speech signal processing. A simple and effective method for pitch detection has recently been presented. It lessens the computing burden, but still has some defects for practical application. Here we improve this simple algorithm effectively, and introduce a method based on positive-negative area into it for pitch detection. Its go...
CASA based speech separation for robust speech recognition
This paper introduces a speech separation system as a front-end processing step for automatic speech recognition (ASR). It employs computational auditory scene analysis (CASA) to separate the target speech from the interfering speech. Specifically, the mixed speech is preprocessed based on an auditory peripheral model. Pitch tracking is then conducted and the dominant pitch is used as a main cu...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
A Snack Implementation and Tcl/Tk Interface to the Fundamental Frequency Variation Spectrum Algorithm
Intonation is an important aspect of vocal production, used for a variety of communicative needs. Its modeling is therefore crucial in many speech understanding systems, particularly those requiring inference of speaker intent in real-time. However, the estimation of pitch, traditionally the first step in intonation modeling, is computationally inconvenient in such scenarios. This is because it...
Publication date: 2004